Automatic Classification of Sentences in Dutch Laws

Authors

  • Emile de Maat
  • Radboud Winkels
Abstract

The work described here builds on [1], where we presented a categorisation of norms or provisions in legislation. We claimed that the categories are characterised by typical sentence structures and that this would enable automatic detection and classification. In this paper we present the results of experiments in such automatic classification of provisions. We defined fourteen categories of provisions and compiled a list of 81 sentence structures for those categories from twenty Dutch laws. Based on these structures, a parser was used to classify the sentences in fifteen different Dutch laws, classifying 94% of 530 sentences correctly. This result compares well with other, statistical approaches. An important improvement to our classifier will be the distinction between principal and auxiliary sentences.
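As a rough illustration of the pattern-based idea described in the abstract, the following minimal sketch matches each sentence against a short list of sentence structures and returns the first category that fits. The three category names and the Dutch phrases are assumptions chosen for illustration; they are not the fourteen categories or the 81 structures from the paper, and plain regular expressions stand in for the parser the authors used.

```python
import re

# Hypothetical category/pattern pairs, for illustration only.
# These are NOT the paper's 81 sentence structures.
PATTERNS = [
    ("definition", re.compile(r"\bwordt verstaan onder\b", re.IGNORECASE)),
    ("obligation", re.compile(r"\b(is verplicht|moet|dient)\b", re.IGNORECASE)),
    ("permission", re.compile(r"\b(kan|mag|is bevoegd)\b", re.IGNORECASE)),
]


def classify(sentence: str) -> str:
    """Return the label of the first pattern found in the sentence."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return "unclassified"


def accuracy(sentences, gold_labels) -> float:
    """Fraction of sentences whose predicted label equals the gold label."""
    correct = sum(classify(s) == g for s, g in zip(sentences, gold_labels))
    return correct / len(gold_labels)


if __name__ == "__main__":
    examples = [
        "In deze wet wordt verstaan onder voertuig: een motorrijtuig.",
        "De aanvrager is verplicht de gegevens te verstrekken.",
        "Onze Minister kan nadere regels stellen.",
    ]
    for sentence in examples:
        print(classify(sentence), "<-", sentence)
```

Pattern order matters in this sketch: a sentence that fits several structures receives the label of the first match, so more specific structures should be listed before more general ones.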

Similar resources

Automatic Emotion Classification for Interpersonal Communication

We introduce a new emotion classification task based on Leary’s Rose, a framework for interpersonal communication. We present a small dataset of 740 Dutch sentences, outline the annotation process and evaluate annotator agreement. We then evaluate the performance of several automatic classification systems when classifying individual sentences according to the four quadrants and the eight octan...

Automatic detection of prominence (as defined by listeners' judgements) in read aloud Dutch sentences

This paper describes a first step towards the automatic classification of prominence (as defined by naive listeners). As a result of a listening experiment each word in 500 sentences was marked with a rating scale between ‘0’ (non-prominent) and ‘10’ (very prominent). These prominence labels are compared with the following acoustical features: loudness of each vowel, and F0 range and duration o...

Integer Linear Programming for Dutch Sentence Compression

Sentence compression is a valuable task in the framework of text summarization. In this paper we compress sentences from news articles from Dutch and Flemish newspapers written in Dutch using an integer linear programming approach. We rely on the Alpino parser available for Dutch and on the Latent Words Language Model. We demonstrate that the integer linear programming approach yields good resu...

Machine Learning versus Knowledge Based Classification of Legal Texts

This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurately (>90%) as the pattern-based one, but seems to generalize worse to new laws. Given these results, the pattern-based approach is to be pref...

Sentence Compression for Dutch Using Integer Linear Programming

Sentence compression is a valuable task in the framework of text summarization. In this paper we compress sentences from news articles taken from Dutch and Flemish newspapers using an integer linear programming approach. We rely on the Alpino parser available for Dutch and on the Latent Words Language Model. We demonstrate that the integer linear programming approach yields good results for com...

Publication date: 2008